Abstract — Benes networks are constructed with simple switch
modules and have many advantages, including small latency and 
requiring only an almost linear number of switch modules. As 
circuit-switches, Benes networks are rearrangeably non-blocking, 
which implies that they are full-throughput as packet switches, 
with suitable routing. 

Routing in Benes networks can be done by time-sharing per- 
mutations. However, this approach requires centralized control 
of the switch modules and statistical knowledge of the traffic 
arrivals. We propose a backpressure-based routing scheme for 
Benes networks, combined with end-to-end congestion control. 
This approach achieves the maximal utility of the network and 
requires only four queues per module, independently of the size 
of the network. 

Index Terms — Benes Network, Dynamic Control, Stochastic 
Network Optimization, Queueing 



I. Introduction 

Data centers have gradually become one of our most impor- 
tant computing resources. For instance, search engines, web 
emails such as Gmail and Hotmail, social network websites 
such as Facebook, and data processing applications such 
as Hadoop are provided by data centers. Consequently, the 
networking of servers and resource allocation in data centers 
have become important problems. 

We develop a networking solution, a Benes packet network, 
which consists of a Benes architecture, a flow utility max- 
imization mechanism, and a backpressure-based scheduling 
algorithm. Specifically, we propose interconnecting the data 
center servers using a Benes network built with simple com- 
modity switch modules. We formulate the resource allocation 
objective as a network flow utility maximization problem 
to guarantee a fair share of the network resources. Lastly, 
we develop a low-complexity backpressure-based scheduling 
algorithm, called Grouped-Backpressure (G-BP), to achieve 
the optimal system performance. The G-BP algorithm is 
provably optimal and automatically handles changing traffic. 
Our approach only requires each switch module to maintain 
four queues, independently of the network size, and hence can 
easily be implemented in practice. 

Many papers explore networking solutions for data centers.
[1] proposes using a random graph based approach to enable
incremental network growth for data centers. [2] proposes a
network architecture based on a Clos network and random traffic
splitting. [3] develops a hierarchical network structure for data
centers. [4] uses the preferential attachment approach to design
network topologies for data centers. [5] proposes a fat-tree
based network architecture. [6] develops a MapReduce-like
system based on a cube-like architecture to exploit the
in-network aggregation possibilities. [7] designs optical networks
for data centers. However, we note that the aforementioned
works mostly focus on designing the network architecture 
and achieving uniform load balancing. Hence, the proposed 
solutions do not immediately apply to problems where dif- 
ferent flows have different service requirements. Moreover, 
the solutions developed in the above works lack system 
performance guarantees. 

In this work, we aim at obtaining a network solution that 
combines practicality, generality, provable optimality, and low 
complexity. Specifically, we propose interconnecting the data 
center servers by a Benes network. As circuit-switches, Benes 
networks are known to be rearrangeably non-blocking and can 
easily be built with only an almost linear number of simple 
switch modules in the network size [8], [9]. Thus, adopting the
Benes network architecture not only guarantees high system 
throughput and low end-to-end packet delay (if routing and 
scheduling are done properly), but also eliminates the need 
for employing expensive switch devices whose cost does not 
scale easily as the data center size increases. Under the Benes 
network architecture, we establish a mathematical formulation 
for determining the allocation of network resources to cope 
with the heterogeneity of the data traffic service requirements. 
Our formulation leverages the network utility maximization 
framework [10], [11], which has been proven to be a general
mechanism for handling network resource allocation problems. 

Finally, to reap the full benefits of the Benes network 
architecture and the resource allocation framework in a prac- 
tical manner, we develop a routing and scheduling algorithm 
that has provable system performance guarantees and a very 
low implementation complexity. Our algorithm is constructed 
based on the recently developed backpressure network optimization
technique [12], combined with an end-to-end congestion
control mechanism. However, different from previous
backpressure algorithms, e.g., [13], [14], [15], which either
require that the number of queues each switch module has to 
maintain is proportional to the network size, or only apply to 
problems with single-path routing, our algorithm uses a novel 
traffic grouping idea and allows us to use only four queues 
per switch module regardless of the network size. Moreover, 
our algorithm automatically explores all the possible routes 
to fully utilize network capacity. These distinct features make 
our algorithm very suitable for practical implementation. 

This paper is organized as follows. In Section II, we present
the system model and state our objective. In Section III, we
set up the notation. Then, we explain the intuition of our
design approach and describe all the needed components of
the Grouped-Backpressure (G-BP) algorithm in Section IV. We
present the G-BP algorithm and analyze its performance in
Section V. Simulation results are presented in Section VI. We
conclude the paper in Section VII.

II. System model 

We consider the system shown in Fig. 1, where a Benes
network connects a set of communicating servers. In this
system, each rectangle is a switch module having two input and
two output links. Each link has a capacity of 1 packet/slot. The
smaller nodes are the servers. Traffic flows are generated from
the servers on the left, called input servers, and are going to the
servers on the right, called output servers.¹ We assume that
the system operates in slotted time, i.e., $t \in \{0, 1, 2, \ldots\}$. The
discrete time assumption is for convenience of the analysis.
The actual network would operate in an asynchronous way
with variable length packets.

Fig. 1. A 16 × 16 Benes network connecting 16 input servers $\mathcal{S}$ to 16
output servers $\mathcal{D}$. The rectangles are the switch modules that form the Benes
network. $\mathcal{R}_i$ refers to row $i$ of the Benes network and $\mathcal{C}_j$ refers to column $j$.

A. Admission control and flow utility 

We label the flows according to their source and destination
servers. Specifically, we call the traffic entering from input
server $s$ and going to output server $d$ the $(s, d)$ flow. We
use $A_{sd}(t)$ to denote the number of $(s, d)$ packets generated
at input server $s$ at time $t$. We assume that for every $(s, d)$
flow, the random variables $\{A_{sd}(t), t = 0, 1, \ldots\}$ are i.i.d. and
have mean $\lambda_{sd} = \mathbb{E}[A_{sd}(t)]$. Our results can be extended to
incorporate much more general arrival processes, e.g.,
Markov-modulated arrivals. We also assume that there exists some
finite constant $A_{\max}$ such that $0 \le A_{sd}(t) \le A_{\max}$ for all
$(s, d)$ and for all time $t$.

In every time slot $t$, each input server performs admission
control to determine how many packets to inject into the
network. We denote by $R_{sd}(t)$, with $0 \le R_{sd}(t) \le A_{sd}(t)$, the number of
$(s, d)$ flow packets actually admitted by input server $s$ for
transmission at time $t$. We then denote the average rate of the
$(s, d)$ flow packets by $r_{sd}$, defined as:²

$$r_{sd} \triangleq \lim_{T \to \infty} \frac{1}{T} \sum_{t=0}^{T-1} \mathbb{E}[R_{sd}(t)]. \qquad (1)$$

Each $(s, d)$ flow is associated with a utility function $U_{sd}(r_{sd})$,
which is concave increasing in its average rate $r_{sd}$. We assume
that the utility functions have finite first derivatives and denote by
$\beta$ their maximum value, i.e.,

$$\beta = \max_{s,d} U'_{sd}(0). \qquad (2)$$

¹ It is straightforward to extend our results to include bi-directional traffic
flows.

² Throughout this paper, we assume that all the limits exist.



B. Stability and objective 

In this paper, we say that a queue with queue size process
$\{Q(t) \ge 0, t = 0, 1, 2, \ldots\}$ is stable if:

$$\limsup_{T \to \infty} \frac{1}{T} \sum_{t=0}^{T-1} \mathbb{E}[Q(t)] < \infty. \qquad (3)$$



Then, we say that a network is stable if all the queues in the
network are stable, and call a routing and scheduling policy
that ensures network stability a stabilizing policy. We use $\Lambda_n$
to denote the capacity region of a $2^n \times 2^n$ Benes network, being
the set of arrival rate vectors under which there exist stabilizing
routing and scheduling policies.

Depending on the routing and scheduling algorithm, the network
queueing structure can be quite different. Our objective
is to find a low-implementation-complexity stabilizing routing
and scheduling policy that maximizes the aggregate flow utility
of the network, i.e.,

$$\max: \; U(r) = \sum_{s,d} U_{sd}(r_{sd}) \qquad \text{s.t.} \quad r \in \Lambda_n, \qquad (4)$$

where $r = (r_{sd}, \forall (s, d))$ with $r_{sd}$ being the average rate of
the $(s, d)$ flow defined in (1). We denote by $r^{\text{opt}}$ the rate vector
that achieves the optimal utility over all stabilizing policies.

Note that our formulation (4) is indeed very general. The
heterogeneity of traffic flow service requirements can easily be
taken into account by designing appropriate utility functions.
Also note that, although our system model is similar to those
in [13], in our paper, the queueing structure is also part of the
algorithm design problem.

C. Discussion 

The problem of optimal routing and scheduling in a Benes
network can be solved by using the well-known backpressure
routing algorithm [12]. However, this approach requires each
node to maintain a separate queue for each output server. Thus,
each node has to maintain $2^n$ queues, which is not practical
when the size of the Benes network (number of servers)
increases. Recent works [14] and [15] propose backpressure-based
algorithms that use far fewer queues. However, the
algorithm in [14] requires nodes to maintain a separate queue
for each cluster of the network nodes and needs a pre-defined
clustering algorithm, whereas the method in [15] is
designed for single-path routing. Below, we develop a novel
low-complexity approach called Grouped-Backpressure (G-BP).
Our approach allows us to use only four queues per
node regardless of the network size.
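
To make the scaling argument concrete, the following minimal sketch compares the per-node queue count of per-destination backpressure ($2^n$) with the four queues used by G-BP, and also reports the number of switch modules in the network ($2n - 1$ columns of $2^{n-1}$ modules, as described in Section III). The function names are illustrative and not part of the paper.

```python
def queues_per_node_backpressure(n: int) -> int:
    """Per-destination queueing: one queue per output server, i.e. 2**n."""
    return 2 ** n


def num_switch_modules(n: int) -> int:
    """A 2**n x 2**n Benes network has 2n - 1 columns of 2**(n-1) modules."""
    return (2 * n - 1) * 2 ** (n - 1)


G_BP_QUEUES_PER_NODE = 4  # independent of n

for n in (4, 10):
    print(
        f"n={n}: servers={2 ** n}, modules={num_switch_modules(n)}, "
        f"backpressure queues/node={queues_per_node_backpressure(n)}, "
        f"G-BP queues/node={G_BP_QUEUES_PER_NODE}"
    )
```

For $n = 10$ (1024 servers), per-destination backpressure would need 1024 queues at each module, whereas G-BP still needs four.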

III. Benes network structure and labeling 

In this section, we explain the structure of Benes networks 
and set up our notations. 

A. Benes network construction 

We first explain how a $2^n \times 2^n$ Benes network is constructed
[8], [9]. Start with a basic $2 \times 2$ Benes network as in Fig. 2(a).
Then, construct a $2^n \times 2^n$ Benes network as follows:

(Step I)-Concatenation: Vertically concatenate two $2^{n-1} \times 2^{n-1}$
Benes networks. Call them the upper subnetwork and
the lower subnetwork, e.g., $m_3$ and $m_4$ in Fig. 2(b). Then,
horizontally place two columns of $2^{n-1}$ basic $2 \times 2$ modules,
one on each side of the concatenated subnetworks. Call the
modules on the left of the concatenated subnetworks the input
switch modules, e.g., $m_1$ and $m_2$, and the modules on the right
the output switch modules, e.g., $m_5$ and $m_6$.

(Step II)-Connect input modules: Connect the upper output
link of the input module in row $k$ to the $k$-th input link of the
upper subnetwork, and connect its lower output link to the $k$-th
input link of the lower subnetwork.

(Step III)-Connect output modules: Connect the $k$-th output
link of the upper subnetwork to the upper input link of the $k$-th
output module, i.e., the output module in row $k$, and connect
the $k$-th output link of the lower subnetwork to the lower input
link of the $k$-th output module.
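
The three construction steps can be sketched recursively as follows. This is an illustrative sketch, not code from the paper: the function name `build_benes`, the 0-based indexing, and the list-of-columns representation are assumptions made here.

```python
from typing import List, Tuple

# links[j][i] = (upper target, lower target) of the module in column j, row i.
Links = List[List[Tuple[int, int]]]


def build_benes(n: int) -> Links:
    """Return the wiring of a 2**n x 2**n Benes network (0-based indices).

    For every column except the last, the targets are row indices in the
    next column. For the last column they are output-server indices.
    """
    if n == 1:
        return [[(0, 1)]]  # basic 2x2 module feeding output servers 0 and 1
    sub = build_benes(n - 1)
    half = 2 ** (n - 2)  # number of rows in each subnetwork
    rows = 2 * half
    # Step II: input module k feeds input k of the upper and lower
    # subnetworks; input k of a subnetwork enters its first-column module
    # in (subnetwork) row k // 2.
    links: Links = [[(k // 2, half + k // 2) for k in range(rows)]]
    # Step I: stack the two subnetworks, upper copy on top of lower copy.
    for j, col in enumerate(sub):
        if j == len(sub) - 1:
            # Step III: output k of either subnetwork feeds output module k,
            # so both copies keep the subnetwork's output indices as rows.
            merged = list(col) + list(col)
        else:
            merged = list(col) + [(half + a, half + b) for a, b in col]
        links.append(merged)
    # Output modules: module k feeds output servers 2k and 2k + 1.
    links.append([(2 * k, 2 * k + 1) for k in range(rows)])
    return links


if __name__ == "__main__":
    for j, col in enumerate(build_benes(3)):
        print(f"column {j + 1}: {col}")
```

With this representation, the first entry of each pair plays the role of the upper outgoing link and the second entry the lower outgoing link of a module.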



Fig. 2. The structure of Benes networks: (a) a basic $2 \times 2$ Benes network;
(b) a $4 \times 4$ Benes network; (c) a general $2^n \times 2^n$ Benes network, with a
column of input switch modules, two $2^{n-1} \times 2^{n-1}$ subnetworks, and a
column of output switch modules.

B. Labeling a Benes network with servers 

We first specify how we label a $2^n \times 2^n$ Benes network.
We denote by $\mathcal{B}_n$ the $2^n \times 2^n$ Benes network (excluding the input
and output servers). Then, we divide the Benes network into
rows and columns. In a $2^n \times 2^n$ Benes network, there are
$2^{n-1}$ rows, denoted by $\{\mathcal{R}_i, i = 1, \ldots, 2^{n-1}\}$. We then denote
the $2n - 1$ columns by $\{\mathcal{C}_j, j = 1, \ldots, 2n - 1\}$. For any node
$m$ in the Benes network, we use $i_m$ and $j_m$ to denote its
row number and column number. For the input and output
servers connecting to the Benes network, we label them using
their row numbers. The set of input servers is denoted by
$\mathcal{S} = \{1, 2, \ldots, 2^n\}$ and the set of output servers is denoted
by $\mathcal{D} = \{1, 2, \ldots, 2^n\}$. Note that both $\mathcal{S}$ and $\mathcal{D}$ have $2^n$ rows
(the small squares in Fig. 1). As in Section III-A, we call the
nodes in $\mathcal{C}_1$ the input switch modules and the nodes in $\mathcal{C}_{2n-1}$
the output switch modules.

From the construction rules of Benes networks and the way
the servers are connected to the Benes network, we see that
for every node $m \in \mathcal{B}_n$, there are two nodes in column $\mathcal{C}_{j_m+1}$
to which it connects (a node $m \in \mathcal{C}_{2n-1}$ connects to two
nodes in $\mathcal{D}$). We denote the node with a smaller row number
by $m_u$ and the other one by $m_l$. There are also two nodes in
column $\mathcal{C}_{j_m-1}$ that connect to $m$ (if $m \in \mathcal{C}_1$, there are two
nodes in $\mathcal{S}$ connecting to it). We denote these two nodes by
$\mathcal{M}_m$. Among these nodes, those that have $m$ as their next
hop with a smaller row number are denoted by $\mathcal{M}^u_m$, and the
other nodes having $m$ as their next hop node with a larger row
number are denoted by $\mathcal{M}^l_m$, i.e.,

$$\mathcal{M}^u_m = \{m' \in \mathcal{C}_{j_m-1} \mid m'_u = m\},$$
$$\mathcal{M}^l_m = \{m' \in \mathcal{C}_{j_m-1} \mid m'_l = m\}.$$

For $m \in \mathcal{C}_1$, we simply use $\mathcal{M}_m$ to denote the input servers
that connect to it. For each input server $s \in \mathcal{S}$, we use $m(s)$
to denote the node in $\mathcal{C}_1$ it connects to. We call the servers
in rows 1 to $2^{n-1}$ the upper division servers, and call all the
other servers the lower division servers. We then call a flow
whose destination is an upper division server an upper division
flow. Otherwise it is a lower division flow.

For a $2^n \times 2^n$ Benes network $\mathcal{B}_n$, we define the nodes in
$\mathcal{C}_n$ as the partition nodes. From the construction rules of the
Benes networks, we first have the following observation:

Fact 1: For a $2^n \times 2^n$ Benes network, its partition nodes
coincide with the partition nodes of its two $2^{n-1} \times 2^{n-1}$
subnetworks.

Below, we denote the upper outgoing link of a switch
module by link $a$ and the lower outgoing link by link $b$ (see
Fig. 2(a)). We use $\mathcal{O}^a_m$ to denote the set of output servers
that can be reached by traversing the upper outgoing link $a$ of
node $m$ and use $\mathcal{O}^b_m$ to denote the set of output servers that can
be reached by traversing link $b$. We then have the following
simple lemma, which can be seen from the construction rules
of Benes networks.

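Lemma 1: (a) Starting from any partition node $m \in \mathcal{C}_n$,
there is a unique path to any output server $d \in \mathcal{D}$. (b) For
every node $m \in \mathcal{C}_{n+l}$, $l \ge 0$, we have:

$$\mathcal{O}^a_m = \{\kappa_m 2^{n-l} + 1, \ldots, (\kappa_m + \tfrac{1}{2}) 2^{n-l}\}, \qquad (5)$$
$$\mathcal{O}^b_m = \{(\kappa_m + \tfrac{1}{2}) 2^{n-l} + 1, \ldots, (\kappa_m + 1) 2^{n-l}\}, \qquad (6)$$

where $\kappa_m = (i_m - 1) \bmod 2^l$.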
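
As a quick illustration of Lemma 1(b), the sketch below evaluates (5) and (6) for a node in column $n + l$ and row $i_m$, using the 1-based labels of Section III-B. The function name `reach_sets` and the use of Python ranges are choices made here for illustration.

```python
def reach_sets(n: int, l: int, i_m: int):
    """Output servers reachable from a node in column n + l, row i_m (1-based).

    Implements (5) and (6) with kappa = (i_m - 1) mod 2**l and block size
    2**(n - l). Returns (O_a, O_b) as ranges of 1-based server labels.
    """
    if not (0 <= l <= n - 1 and 1 <= i_m <= 2 ** (n - 1)):
        raise ValueError("need 0 <= l <= n - 1 and 1 <= i_m <= 2**(n-1)")
    kappa = (i_m - 1) % (2 ** l)
    block = 2 ** (n - l)
    start = kappa * block
    o_a = range(start + 1, start + block // 2 + 1)          # via upper link a
    o_b = range(start + block // 2 + 1, start + block + 1)  # via lower link b
    return o_a, o_b


if __name__ == "__main__":
    n = 4  # the 16 x 16 network of Fig. 1
    for l, i_m in [(0, 3), (2, 7), (3, 5)]:
        o_a, o_b = reach_sets(n, l, i_m)
        print(f"column {n + l}, row {i_m}: O_a={list(o_a)}, O_b={list(o_b)}")
```

For a partition node ($l = 0$), link $a$ reaches exactly the upper division servers and link $b$ the lower division servers; for an output switch module ($l = n - 1$) in row $i_m$, the two sets are the single servers $2 i_m - 1$ and $2 i_m$.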



IV. Intuition and key components of 
Grouped-Backpressure 

In this section, we present the idea and all the needed 
components for our Grouped-Backpressure algorithm (G-BP), 
which will be used to achieve the optimal flow utility under 
the Benes network architecture. 

A. The idea 

The idea of Grouped-Backpressure is to "group" all the 
flows into two mixed flows, the upper division flow and the 
lower division flow. Then, we construct a scheme for routing 
the mixed traffic in the first half of the network based on 
a fictitious reference system. This approach allows us to use 
very few queues per node. However, due to this traffic mixing, 
we lose the ability to control each individual flow inside the 
network. Hence, the flows can be routed arbitrarily inside 
the network, in which case certain nodes may receive more 
traffic than they can handle and become unstable. In order to
resolve this problem, we impose a special queueing structure
at each node to ensure that routing and scheduling is done in a
fully symmetric manner. With this approach, we guarantee that
every flow is split into sub-flows with equal rates and routed
through the partition nodes. In the second half of the network,
by Lemma 1, each packet will traverse a unique path to its
destination. Hence, we will do a "free-flow" routing. Using
the symmetric structure of the Benes network, we then show
that the G-BP algorithm can stabilize the network and achieve
maximum utility. Our approach is demonstrated in Fig. 3.




Fig. 3. The pictorial illustration of the G-BP algorithm. The first half of the 
network is controlled by a backpressure-like algorithm based on a fictitious 
reference system. The second half of the network uses a "free-flow" scheme 
for packet delivery. 

B. A fictitious reference system 

In order to guide the routing and scheduling of the grouped 
traffic, we create a fictitious reference system as follows. 

1) Remove all the nodes in columns $n + 1$ to $2n - 1$.

2) Create two fictitious destination nodes $D_1$ and $D_2$,
where $D_1$ represents the common destination for the
upper division flows and $D_2$ represents the common
destination for the lower division flows.

3) Connect each partition node, i.e., a node in $\mathcal{C}_n$, to $D_1$
with a link of capacity 1 packet/slot and to $D_2$ with a
link of capacity 1 packet/slot.

An example of the fictitious system is shown in Fig. 4 for a
16 × 16 Benes network. The fictitious system will be used as
a reference system to guide us on serving the grouped traffic.
Specifically, we will design a backpressure-based algorithm for
the fictitious system, and use the exact same actions to control
the nodes in columns 1 to $n - 1$ in the physical system. This
approach has the useful property that it allows us to use only
4 queues per node.

C. Queue structure and load balancing 

Since in the reference system we only have 2 destinations 
and do not distinguish flows inside the network, if routing is 
not done carefully, it can happen that most of the traffic going 
to an output port is routed to a single partition node and causes 
instability of the node. In order to resolve this issue, we impose 
a special queueing structure on the switch nodes to balance 
all the traffic, so that each flow is equally split among all 
possible paths and routed to the partition nodes. Doing so, we 
guarantee that as long as the traffic rate is supportable (will 
be explained later), no node will be overwhelmed. 




Fig. 4. The fictitious reference system for the 16 X 16 Benes network with 
servers. 

We now specify our queueing structure for both the fictitious
system and the physical system:

1) Input servers in both systems: For each input server $s \in \mathcal{S}$,
we maintain 2 queues per node as follows:

• $Q^U_s(t)$: number of upper division flow packets stored at
input server $s$;

• $Q^L_s(t)$: number of lower division flow packets stored at
input server $s$.

These two queues evolve according to the following dynamics:

$$Q^{\mathcal{T}}_s(t+1) = [Q^{\mathcal{T}}_s(t) - \mu^{\mathcal{T}}_{s,m(s)}(t)]^+ + R^{\mathcal{T}}_s(t). \qquad (7)$$

Here the notation $[x]^+ = \max[x, 0]$ and the notation $\mathcal{T} \in \Omega_S = \{U, L\}$
denotes the "type" of the traffic at the input
servers, and $R^{\mathcal{T}}_s(t)$ denotes the aggregate arrival to $Q^{\mathcal{T}}_s(t)$,
i.e.,

$$R^U_s(t) = \sum_{d \le 2^{n-1}} R_{sd}(t), \qquad R^L_s(t) = \sum_{d > 2^{n-1}} R_{sd}(t). \qquad (8)$$
2) Switch modules in columns 1 to $n - 1$ in both systems:
We maintain 4 queues per node as follows:

• $Q^{UU}_m(t)$: number of upper division flow packets that will
be routed through $m_u$;

• $Q^{UL}_m(t)$: number of upper division flow packets that will
be routed through $m_l$;

• $Q^{LU}_m(t)$: number of lower division flow packets that will
be routed through $m_u$;

• $Q^{LL}_m(t)$: number of lower division flow packets that will
be routed through $m_l$.

Now define $\Omega_B = \{UU, UL, LU, LL\}$ and use $\mathcal{T} \in \Omega_B$ to
denote the type of these queues at the switch nodes. We see
that the queues evolve according to the following dynamics:

$$Q^{\mathcal{T}}_m(t+1) \le [Q^{\mathcal{T}}_m(t) - \mu^{\mathcal{T}}_{m,m(\mathcal{T})}(t)]^+ + R^{\mathcal{T}}_m(t), \quad \forall\, \mathcal{T} \in \Omega_B. \qquad (9)$$

Here $m(\mathcal{T})$ is the next hop node corresponding to the type $\mathcal{T}$
traffic, i.e., $m(\mathcal{T}) = m_u$ for $\mathcal{T} \in \{UU, LU\}$ and $m(\mathcal{T}) = m_l$
otherwise. $R^{\mathcal{T}}_m(t)$ is the aggregate arrivals to $Q^{\mathcal{T}}_m(t)$, given by:

$$R^{UU}_m(t) = X_m(t) R^U_m(t), \quad R^{UL}_m(t) = (1 - X_m(t)) R^U_m(t), \qquad (10)$$
$$R^{LU}_m(t) = Y_m(t) R^L_m(t), \quad R^{LL}_m(t) = (1 - Y_m(t)) R^L_m(t), \qquad (11)$$

where $R^U_m(t)$ and $R^L_m(t)$ are the aggregate upper and lower
division arrivals to node $m$, given by:

$$R^U_m(t) = \sum_{m' \in \mathcal{M}^u_m} \mu^{UU}_{m',m}(t) + \sum_{m' \in \mathcal{M}^l_m} \mu^{UL}_{m',m}(t), \qquad (12)$$
$$R^L_m(t) = \sum_{m' \in \mathcal{M}^u_m} \mu^{LU}_{m',m}(t) + \sum_{m' \in \mathcal{M}^l_m} \mu^{LL}_{m',m}(t). \qquad (13)$$

The variables $X_m(t)$ and $Y_m(t)$ are i.i.d. Bernoulli variables
taking values 0 or 1 with equal probabilities, introduced for
ensuring an equal division of the flow rates. Note that we have
used inequality in (9). This is because the actual packet arrivals
to $Q^{\mathcal{T}}_m(t)$ may be less than $R^{\mathcal{T}}_m(t)$ as the upstream nodes may
not have enough packets to fulfill the allocated transmission
rates. Our queueing structure and traffic splitting scheme are
demonstrated in Fig. 5.
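
The splitting rule in (10) and (11) and the queue recursion in (9) can be sketched as follows. This is a minimal illustration under stated assumptions: the names `split_arrivals` and `update_queue` are hypothetical, the recursion is taken with equality, and the fixed seed is used only to make the run repeatable.

```python
import random


def split_arrivals(r_upper: int, r_lower: int, rng: random.Random) -> dict:
    """Split one slot's aggregate arrivals at node m as in (10) and (11).

    X and Y are independent fair coin flips; each sends the whole upper
    (resp. lower) division batch to the queue for m_u or for m_l.
    """
    x = rng.randint(0, 1)
    y = rng.randint(0, 1)
    return {
        "UU": x * r_upper, "UL": (1 - x) * r_upper,
        "LU": y * r_lower, "LL": (1 - y) * r_lower,
    }


def update_queue(q: int, mu: int, arrivals: int) -> int:
    """One step of the bound in (9), taken with equality: [q - mu]^+ + arrivals."""
    return max(q - mu, 0) + arrivals


rng = random.Random(0)  # fixed seed, only so that the sketch is repeatable
slots = 10_000
to_upper = sum(split_arrivals(1, 0, rng)["UU"] for _ in range(slots))
print(to_upper / slots)  # close to 0.5: about half of the traffic goes via m_u
```

Because the coin flips are fair and independent of the traffic, each of the two queues of a division receives half of that division's arrival rate on average, which is the property that Lemma 2 relies on.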



Fig. 5. The queueing structure and traffic splitting method.

3) Partition nodes in the fictitious system: Each node $m \in \mathcal{C}_n$
maintains only two queues $Q^{D_1}_m(t)$ and $Q^{D_2}_m(t)$ with the
following dynamics:

$$Q^{D_i}_m(t+1) \le [Q^{D_i}_m(t) - \mu_{m,D_i}(t)]^+ + R^{D_i}_m(t), \quad i = 1, 2. \qquad (14)$$

Here $R^{D_1}_m(t) = R^U_m(t)$ and $R^{D_2}_m(t) = R^L_m(t)$ are the aggregate
arrivals defined in (12) and (13).

4) Nodes in columns $n$ to $2n - 1$ in the physical system:
Each node $m \in \cup_{j=n}^{2n-1} \mathcal{C}_j$ maintains two First-In-First-Out
(FIFO) queues $Q^a_m(t)$ and $Q^b_m(t)$, one for the upper output
link $a$ and the other for the lower output link $b$ (see Fig. 2(a)).
The arrivals are placed into the queues according to
their destinations, i.e.,

$$Q^a_m(t+1) = [Q^a_m(t) - \mu^a_m(t)]^+ + \sum_{m' \in \mathcal{M}_m} \tilde{\mu}^a_{m',m}(t), \qquad (15)$$
$$Q^b_m(t+1) = [Q^b_m(t) - \mu^b_m(t)]^+ + \sum_{m' \in \mathcal{M}_m} \tilde{\mu}^b_{m',m}(t). \qquad (16)$$

Here $\tilde{\mu}^a_{m',m}(t) = \sum_s \sum_{d \in \mathcal{O}^a_m} \tilde{\mu}^{sd}_{m',m}(t)$, where $\tilde{\mu}^{sd}_{m',m}(t)$
denotes the actual number of flow $(s, d)$ packets sent from
$m'$ to $m$ at time $t$, and $\tilde{\mu}^b_{m',m}(t) = \sum_s \sum_{d \in \mathcal{O}^b_m} \tilde{\mu}^{sd}_{m',m}(t)$
denotes the number of packets that need to traverse the lower
outgoing link $b$ to their destinations.

Notice that in both systems, each partition node only
maintains two queues and does not further split the traffic.
This is because in the fictitious system, the next hop nodes
of a partition node are $D_1$ and $D_2$, whereas in the physical
system, the flow $(s, d)$ packets at the partition nodes will be
delivered to output server $d$ following a unique path according
to Lemma 1.

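We now show that under the special structure of the Benes
network, our queueing structure and traffic splitting scheme
generate a balanced routing across the network. This is
summarized in the following lemma, where, for $\mathcal{T} \in \Omega_B$, we use
$\bar{\mu}^{\mathcal{T}}_{m,m(\mathcal{T})}$ to denote the time average transmission rate of the
type $\mathcal{T}$ traffic from $m$ to $m(\mathcal{T})$. Specifically,

$$\bar{\mu}^{\mathcal{T}}_{m,m(\mathcal{T})} \triangleq \lim_{T \to \infty} \frac{1}{T} \sum_{t=0}^{T-1} \mathbb{E}[\tilde{\mu}^{\mathcal{T}}_{m,m(\mathcal{T})}(t)].$$

Here $\tilde{\mu}^{\mathcal{T}}_{m,m(\mathcal{T})}(t)$ denotes the actual number of type $\mathcal{T}$ packets
sent over the link $[m, m(\mathcal{T})]$ at time $t$. Similarly, we use $\beta^{sd}_m$
to denote the average rate of the flow $(s, d)$ packets going
through a node $m$, i.e.,

$$\beta^{sd}_m \triangleq \lim_{T \to \infty} \frac{1}{T} \sum_{t=0}^{T-1} \mathbb{E}[\tilde{\mu}^{sd}_{m,m_u}(t) + \tilde{\mu}^{sd}_{m,m_l}(t)],$$

where $\tilde{\mu}^{sd}_{m,m_u}(t)$ and $\tilde{\mu}^{sd}_{m,m_l}(t)$ denote the actual numbers of
flow $(s, d)$ packets sent from node $m$ to node $m_u$ and node
$m_l$ at time $t$, respectively.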

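Lemma 2: If the fictitious network is stable, then,

(a) For every node $m \in \cup_{j=1}^{n-1} \mathcal{C}_j$,

$$\bar{\mu}^{UU}_{m,m_u} = \bar{\mu}^{UL}_{m,m_l}, \qquad \bar{\mu}^{LU}_{m,m_u} = \bar{\mu}^{LL}_{m,m_l}. \qquad (17)$$

(b) The average rate of any $(s, d)$ flow packets going through
any partition node $m \in \mathcal{C}_n$ satisfies $\beta^{sd}_m = r_{sd}/2^{n-1}$.
Proof: See Appendix A. ■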

D. The arrival admission queue

Since the arrivals to the network are dynamic, in order
to perform packet admission in a fair manner, we introduce
an auxiliary variable $\gamma_{sd}(t)$ and create the following virtual
admission queue for every flow $(s, d)$:

$$H_{sd}(t+1) = [H_{sd}(t) - R_{sd}(t)]^+ + \gamma_{sd}(t). \qquad (18)$$

Intuitively, $\gamma_{sd}(t)$ indicates how many flow $(s, d)$ packets
should have been admitted into the network. However, due
to the randomness of the arrivals, this may not be feasible at
every time $t$. Hence, the admission queue $H_{sd}(t)$ is created to
ensure that in the long run, the admitted packets have a rate
that is no smaller than the rate they should have received.

E. The output regulation queue 

Here we specify the last component needed for our algorithm.
Note that the above subsections have been dealing with
reducing the number of queues per node and balancing the
traffic inside the Benes network. In order to guarantee stability
of the network, one also needs to ensure that the total traffic
going to any output port of the Benes network does not exceed
its capacity. To do so, we create the following regulation queue
for each output port $d \in \{1, \ldots, 2^n\}$ (or equivalently, output
server $d$):

$$q_d(t+1) = [q_d(t) - (1 - \eta)]^+ + \sum_{s} R_{sd}(t). \qquad (19)$$

That is, the input to this queue is the set of all admitted packets
destined for output port $d$, and the service rate of the queue is
$1 - \eta$ for some small $\eta > 0$ for all time. The intuition here is
that if these virtual queues are stable, then the average traffic
rate for any output port is no more than $1 - \eta$. The reason
we have a small $\eta$ "slack" is to ensure queue stability for the
nodes in columns $n$ to $2n - 1$ in the physical network.
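
The effect of the regulation queue (19) can be illustrated with a short sketch. The value $\eta = 0.125$ and the two arrival patterns below are hypothetical and chosen only so that the arithmetic is easy to follow; `regulation_queue_step` is an illustrative name.

```python
def regulation_queue_step(q: float, admitted_to_d: list, eta: float) -> float:
    """One update of the output regulation queue (19) for a single port d."""
    return max(q - (1.0 - eta), 0.0) + sum(admitted_to_d)


def run(pattern: list, eta: float, slots: int) -> float:
    """Feed a periodic per-slot total of admitted packets into the queue."""
    q = 0.0
    for t in range(slots):
        q = regulation_queue_step(q, [pattern[t % len(pattern)]], eta)
    return q


eta = 0.125
print(run([1, 0], eta, 8))  # average load 0.5 <= 1 - eta: queue stays bounded
print(run([1], eta, 8))     # average load 1 > 1 - eta: queue keeps growing
```

When the admitted load for the port stays below $1 - \eta$, the virtual queue remains bounded; when it exceeds $1 - \eta$, the queue grows by the excess in every slot, which in G-BP throttles admission through the term $q_d(t)$.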

V. The Grouped-Backpressure algorithm (G-BP) 

In this section, we present the construction of the G-BP
algorithm and its performance.
A. Constructing G-BP

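For notation purposes, we first define the aggregate network
queue vector of the fictitious network as follows:

$$Z(t) = \big(Q^{\mathcal{T}}_s(t), \forall s, \mathcal{T} \in \Omega_S;\; Q^{\mathcal{T}}_m(t), \forall m \in \cup_{j=1}^{n-1} \mathcal{C}_j, \mathcal{T} \in \Omega_B;\; Q^{D_i}_m(t), \forall m \in \mathcal{C}_n, i = 1, 2;\; H_{sd}(t), \forall (s, d);\; q_d(t), \forall d\big).$$

Then, we define the following Lyapunov function:

$$L(t) = \frac{1}{2} \sum_{s, \mathcal{T} \in \Omega_S} [Q^{\mathcal{T}}_s(t)]^2 + \frac{1}{2} \sum_{m \in \cup_{j=1}^{n-1} \mathcal{C}_j} \sum_{\mathcal{T} \in \Omega_B} [Q^{\mathcal{T}}_m(t)]^2 + \frac{1}{2} \sum_{m \in \mathcal{C}_n} \sum_{i=1,2} [Q^{D_i}_m(t)]^2 + \frac{1}{2} \sum_{s,d} [H_{sd}(t)]^2 + \frac{1}{2} \sum_{d} [q_d(t)]^2. \qquad (20)$$

Now define a Lyapunov drift as follows:

$$\Delta(t) = \mathbb{E}\big[L(t+1) - L(t) \mid Z(t)\big]. \qquad (21)$$

Using the facts that $0 \le A_{sd}(t) \le A_{\max}$ and that all the link
capacities in the network are bounded, we get the following
lemma for the drift. In the lemma, the parameter $V \ge 1$ is a
control parameter offered by the algorithm to control the flow
utility performance.

Lemma 3: Under any control policy, the following property
holds for the drift at any time $t$:

$$\Delta(t) - V \mathbb{E}\Big[\sum_{s,d} U_{sd}(\gamma_{sd}(t)) \mid Z(t)\Big] \le B - \sum_{d} q_d(t)(1 - \eta) - \sum_{m \in \mathcal{C}_n} \sum_{i=1,2} \mathbb{E}\big[Q^{D_i}_m(t) \mu_{m,D_i}(t) \mid Z(t)\big] \qquad (22)$$
$$- \mathbb{E}\Big[\sum_{s,d} \big(V U_{sd}(\gamma_{sd}(t)) - H_{sd}(t) \gamma_{sd}(t)\big) \mid Z(t)\Big]$$
$$- \mathbb{E}\Big[\sum_{s} \sum_{d \le 2^{n-1}} R_{sd}(t)\big[H_{sd}(t) - q_d(t) - Q^U_s(t)\big] \mid Z(t)\Big]$$
$$- \mathbb{E}\Big[\sum_{s} \sum_{d > 2^{n-1}} R_{sd}(t)\big[H_{sd}(t) - q_d(t) - Q^L_s(t)\big] \mid Z(t)\Big]$$
$$- \mathbb{E}\Big[\sum_{s} \mu^U_{s,m(s)}(t)\big[Q^U_s(t) - \tfrac{1}{2} Q^{UU}_{m(s)}(t) - \tfrac{1}{2} Q^{UL}_{m(s)}(t)\big] \mid Z(t)\Big]$$
$$- \mathbb{E}\Big[\sum_{s} \mu^L_{s,m(s)}(t)\big[Q^L_s(t) - \tfrac{1}{2} Q^{LU}_{m(s)}(t) - \tfrac{1}{2} Q^{LL}_{m(s)}(t)\big] \mid Z(t)\Big]$$
$$- \mathbb{E}\Big[\sum_{m \in \mathcal{C}_{n-1}} \sum_{\mathcal{T} \in \{UU, UL\}} \mu^{\mathcal{T}}_{m,m(\mathcal{T})}(t)\big[Q^{\mathcal{T}}_m(t) - Q^{D_1}_{m(\mathcal{T})}(t)\big] \mid Z(t)\Big]$$
$$- \mathbb{E}\Big[\sum_{m \in \mathcal{C}_{n-1}} \sum_{\mathcal{T} \in \{LU, LL\}} \mu^{\mathcal{T}}_{m,m(\mathcal{T})}(t)\big[Q^{\mathcal{T}}_m(t) - Q^{D_2}_{m(\mathcal{T})}(t)\big] \mid Z(t)\Big]$$
$$- \mathbb{E}\Big[\sum_{m \in \cup_{j=1}^{n-2} \mathcal{C}_j} \sum_{\mathcal{T} \in \{UU, UL\}} \mu^{\mathcal{T}}_{m,m(\mathcal{T})}(t)\big[Q^{\mathcal{T}}_m(t) - \tfrac{1}{2} Q^{UU}_{m(\mathcal{T})}(t) - \tfrac{1}{2} Q^{UL}_{m(\mathcal{T})}(t)\big] \mid Z(t)\Big]$$
$$- \mathbb{E}\Big[\sum_{m \in \cup_{j=1}^{n-2} \mathcal{C}_j} \sum_{\mathcal{T} \in \{LU, LL\}} \mu^{\mathcal{T}}_{m,m(\mathcal{T})}(t)\big[Q^{\mathcal{T}}_m(t) - \tfrac{1}{2} Q^{LU}_{m(\mathcal{T})}(t) - \tfrac{1}{2} Q^{LL}_{m(\mathcal{T})}(t)\big] \mid Z(t)\Big].$$

Here $B$ is a constant given by:

$$B = \tfrac{1}{2}\big[\,2^n(10n - 2) + A_{\max}^2(\ldots)\,\big], \qquad (23)$$

and the expectation is taken over the random arrivals as well
as the potential randomness in the actions.

Proof: See Appendix B. ■

Note that since the Benes network size is $O(2^n)$, the $n$
value is only logarithmic in the network size. Hence, $B$ is
indeed only polynomial in the network size. Based on the
above lemma, we now describe our algorithm for the physical
system. In the algorithm, we will operate the nodes in $\mathcal{S}$ and
$\cup_{j=1}^{n-1} \mathcal{C}_j$ in the physical system exactly as we operate them
in the fictitious system. For these nodes, the actions will be
chosen in every time slot to minimize the right-hand-side
(RHS) of the drift expression (22). For all the modules in
columns $n$ to $2n - 1$, we simply do a free-flow routing.

Grouped-Backpressure (G-BP): At every time slot $t$, observe
$A(t)$ and $Z(t)$, and perform the following:

• Auxiliary Variable Selection: For every $(s, d)$ flow,
choose $\gamma_{sd}(t)$ to solve:

$$\max: \; V U_{sd}(\gamma_{sd}(t)) - H_{sd}(t) \gamma_{sd}(t) \qquad \text{s.t.} \quad 0 \le \gamma_{sd}(t) \le A_{\max}. \qquad (24)$$

• Admission Control: For every input server $s$: If $d \le 2^{n-1}$,
choose $R_{sd}(t) = A_{sd}(t)$ if $H_{sd}(t) - q_d(t) - Q^U_s(t) > 0$;
else choose $R_{sd}(t) = 0$. If $d > 2^{n-1}$, choose
$R_{sd}(t) = A_{sd}(t)$ if $H_{sd}(t) - q_d(t) - Q^L_s(t) > 0$; else
choose $R_{sd}(t) = 0$.

• Routing and Scheduling:

- For any node $m \in \cup_{j=1}^{n-1} \mathcal{C}_j \cup \mathcal{S}$: define the following
weights for the outgoing link $[m, m_u]$:

$$W^{UU}_{m,m_u}(t) = \max\big[Q^{UU}_m(t) - \tilde{Q}^U_{m_u}(t), 0\big], \qquad (25)$$
$$W^{LU}_{m,m_u}(t) = \max\big[Q^{LU}_m(t) - \tilde{Q}^L_{m_u}(t), 0\big], \qquad (26)$$

where $\tilde{Q}^U_{m_u}(t)$ and $\tilde{Q}^L_{m_u}(t)$ are defined as:

$$\tilde{Q}^U_{m_u}(t) = \begin{cases} \tfrac{1}{2} Q^{UU}_{m_u}(t) + \tfrac{1}{2} Q^{UL}_{m_u}(t) & j_m \le n - 2, \\ Q^{D_1}_{m_u}(t) & j_m = n - 1, \end{cases} \qquad (27)$$
$$\tilde{Q}^L_{m_u}(t) = \begin{cases} \tfrac{1}{2} Q^{LU}_{m_u}(t) + \tfrac{1}{2} Q^{LL}_{m_u}(t) & j_m \le n - 2, \\ Q^{D_2}_{m_u}(t) & j_m = n - 1. \end{cases} \qquad (28)$$

Then, we choose the service rates $\mu^{UU}_{m,m_u}(t)$ and
$\mu^{LU}_{m,m_u}(t)$ for link $[m, m_u]$ to solve:

$$\max: \; W^{UU}_{m,m_u}(t) \mu^{UU}_{m,m_u} + W^{LU}_{m,m_u}(t) \mu^{LU}_{m,m_u} \qquad (29)$$
$$\text{s.t.} \quad \mu^{UU}_{m,m_u} + \mu^{LU}_{m,m_u} \le 1, \quad \mu^{UU}_{m,m_u}, \mu^{LU}_{m,m_u} \in \{0, 1\}.$$

To solve for $\mu^{UL}_{m,m_l}(t)$ and $\mu^{LL}_{m,m_l}(t)$, we replace
$Q^{UU}_m(t)$ and $Q^{LU}_m(t)$ with $Q^{UL}_m(t)$ and $Q^{LL}_m(t)$ in (25)
and (26). Also, we replace $m_u$ with $m_l$ in (27) and (28).
If $m = s \in \mathcal{S}$, we simply
replace $Q^{UU}_m(t)$ and $Q^{LU}_m(t)$ with $Q^U_s(t)$ and $Q^L_s(t)$
in (25) and (26), and replace $m_u$ by $m(s)$ in (27)
and (28).

- For every node $m \in \cup_{j=n}^{2n-1} \mathcal{C}_j$: Each module serves
each FIFO queue for each outgoing link according
to (15) and (16) with $\mu^a_m(t) = \mu^b_m(t) = 1$ for
all time.

• Queue Updates: In the fictitious system, choose the
service rates $\mu_{m,D_1}(t)$ and $\mu_{m,D_2}(t)$ to solve:

$$\max: \; Q^{D_1}_m(t) \mu_{m,D_1}(t) + Q^{D_2}_m(t) \mu_{m,D_2}(t) \qquad \text{s.t.} \quad \mu_{m,D_1}(t), \mu_{m,D_2}(t) \in \{0, 1\}. \qquad (30)$$

Then, update all the queues in both the fictitious system
and the physical system according to their dynamics.

We note that G-BP only controls the first half of the physical
system with the backpressure actions. All the nodes in columns
$n$ to $2n - 1$ simply serve the flows in a "free-flow" manner,
i.e., always serve the flows at the maximum rate. This is
different from the usual backpressure algorithms that control
all the queues in the network to ensure stability.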
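
The per-slot decisions of G-BP at an input server and on a single link can be sketched as below. This is an illustration under explicit assumptions, not the authors' implementation: the utility is taken to be $U_{sd}(\gamma) = \log(1 + \gamma)$ (the paper only requires concave increasing utilities), and all function names are hypothetical.

```python
def choose_gamma_log_utility(V: float, H: float, a_max: float) -> float:
    """Maximize V*log(1 + g) - H*g over 0 <= g <= a_max, cf. (24).

    The log utility is an assumption of this sketch. The objective is
    concave, so the stationary point V/H - 1 is clipped to [0, a_max].
    """
    if H <= 0:
        return a_max  # objective is increasing in g when H = 0
    return min(max(V / H - 1.0, 0.0), a_max)


def admit(a_sd: int, H_sd: float, q_d: float, Q_div: float) -> int:
    """Admission control: admit all arrivals iff H_sd - q_d - Q_div > 0.

    Q_div is the server's upper or lower division queue, depending on d.
    """
    return a_sd if H_sd - q_d - Q_div > 0 else 0


def link_weight(q_here: float, q_next_effective: float) -> float:
    """Backpressure weight: positive part of the queue differential."""
    return max(q_here - q_next_effective, 0.0)


def schedule_link(w_upper_div: float, w_lower_div: float) -> tuple:
    """Serve, on a unit-capacity link, the type with the larger positive weight."""
    if max(w_upper_div, w_lower_div) <= 0:
        return (0, 0)
    return (1, 0) if w_upper_div >= w_lower_div else (0, 1)


# Hypothetical snapshot for a node m in a column j_m <= n - 2, link [m, m_u]:
w_uu = link_weight(6.0, 0.5 * (2.0 + 4.0))  # Q^UU_m against the average at m_u
w_lu = link_weight(3.0, 0.5 * (1.0 + 3.0))  # Q^LU_m against the average at m_u
print(choose_gamma_log_utility(V=10.0, H=5.0, a_max=3.0))
print(admit(a_sd=2, H_sd=5.0, q_d=1.0, Q_div=3.0))
print(schedule_link(w_uu, w_lu))
```

Each decision uses only locally available quantities plus the regulation queue size $q_d(t)$, which is why the algorithm lends itself to a distributed implementation.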

B. Performance analysis 

In this section, we prove that G-BP achieves a near-optimal
performance. To carry out our analysis, we first have the
following theorem, which characterizes the capacity region of
a Benes network. In the theorem, we use $r = (r_{sd}, \forall (s, d))$
to denote the vector of arrival rates, where $r_{sd}$ represents the
average rate of the $(s, d)$ flow.

Theorem 1: [8], [9] The capacity region of the Benes network
$\mathcal{B}_n$ is given by:

$$\Lambda_n = \Big\{ r \;\Big|\; \sum_{s=1}^{2^n} r_{sd} \le 1, \; \sum_{d=1}^{2^n} r_{sd} \le 1, \; r_{sd} \ge 0, \; \forall s, d \Big\}.$$

We now present the performance results of the G-BP 
algorithm. Recall that f3 is defined in Q to be the maximum 
first derivative among all utility functions, and that r opt € A n 
denotes the optimal solution to the flow utility maximization 
problem. 

Theorem 2: Suppose both the fictitious network and the 
physical network are empty at time t = 0, i.e., all the queues 
are zero. Then, (i) Both the fictitious network and the physical 
network are stable under G-BP, and (ii) Denote r G BP the time 
average rate vector achieved by G-BP. We have: 

C/(r G " BP ) > C/(r opt ) -y- 2 n j3 V . (31) 

Proof: See Appendix C. ■ 
From (31), we see that the utility performance of G-BP
can approach the optimal arbitrarily closely as we increase $V$ and
decrease $\eta$. However, doing so will increase the average
network delay. Hence, there is a natural tradeoff between the
utility performance and the network delay.
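
To get a feel for this tradeoff, the sketch below evaluates the worst-case utility gap, assuming the bound reads $U(r^{\text{G-BP}}) \ge U(r^{\text{opt}}) - B/V - 2^n\beta\eta$ with the constant $B$ as given in (34) of Appendix B. The function names are hypothetical. For the utility $\log(1 + r)$ used in the simulations, the maximum first derivative is attained at $r = 0$, so $\beta = 1$.

```python
def drift_constant_B(n: int, a_max: float) -> float:
    """Constant B from (34), as read from the appendix:
    B = 0.5 * [2^n (10n - 2) + A_max^2 (2^(3n-1) + 2^(2n+1) + 2^(3n))]."""
    return 0.5 * (2 ** n * (10 * n - 2)
                  + a_max ** 2 * (2 ** (3 * n - 1)
                                  + 2 ** (2 * n + 1)
                                  + 2 ** (3 * n)))


def utility_gap(n: int, a_max: float, beta: float, V: float, eta: float) -> float:
    """Worst-case gap B/V + 2^n * beta * eta between G-BP and the optimum."""
    return drift_constant_B(n, a_max) / V + (2 ** n) * beta * eta
```

The gap shrinks as $V$ grows and $\eta$ shrinks. It is a worst-case guarantee, so the numbers it produces can be much larger than the gap observed in simulation.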

Note that though the performance results in Theorem [2] look 
similar to previous results in [13], the proof is indeed quite 
different. This is because in our case, we impose a special 
queueing structure on the network, and the second half of the 
network uses a free-flow routing. These two features make the 
analysis very different from the usual backpressure algorithms, 
under which each node maintains a separate queue for each 
flow, and all the network actions are based on the network 
queue sizes. 

C. Discussion on implementation 

We note that the G-BP algorithm can easily be implemented
in a fully distributed manner. Specifically, one can maintain the
virtual admission queues at the input servers and maintain the
virtual output regulation queues at the output servers using
counters, as shown in Fig. 6. With this arrangement, the
auxiliary variable selection step can easily be done locally 
at the input servers, and the routing and scheduling step can 
easily be done by each node exchanging queue information 
only with its four neighbors. The admission control step 
requires the input servers to know the regulation queue sizes. 
This can be achieved by message passing the regulating queue 
sizes along the network using prioritized packets. Similarly, the 
update of the regulation queues requires the knowledge of the 
arrivals for the output port. This can be approximated by using 
the arrivals to the output servers as the input to the regulation 
queues. Though message passing and queue approximation
may incur performance loss in practice, we will see in the
simulation section that the G-BP algorithm is very
robust and can still achieve near-optimal performance even
under different message passing delays and regulation queue
approximation.
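
As a rough sketch of the counter-based regulation queue described above, the class below keeps one counter per output server and feeds it with the locally observed arrivals. The update rule is an assumed form, $q_d(t+1) = \max[q_d(t) - (1-\eta), 0] + \text{arrivals}$, chosen to match the service rate $1-\eta$ used in the analysis; the class and method names are hypothetical.

```python
class RegulationCounter:
    """Virtual regulation queue kept as a counter at an output server."""

    def __init__(self, eta: float):
        self.eta = eta  # slack parameter, 0 < eta < 1
        self.q = 0.0    # current counter value

    def update(self, arrivals: int) -> float:
        """Serve at rate (1 - eta), then add this slot's arrivals.

        `arrivals` is the number of packets that reached the output
        server in the slot, used as an approximation of the packets
        admitted for this output port.
        """
        self.q = max(self.q - (1.0 - self.eta), 0.0) + arrivals
        return self.q
```

The counter value is what would be passed back to the input servers for admission control.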

Finally, note that we have described implementing
our algorithm with actual data queue sizes. In practice, to
further reduce network delay, we can also implement G-BP
with counters that keep track of the queue processes that should
have been generated for decision making, and admit slightly
smaller arrival rates than G-BP.



Fig. 6. Implementation of G-BP. The virtual admission queues are maintained
at the input servers, while the virtual regulation queues are maintained
at the output servers. Message passing is used to send regulation queue
information through the network for admission control. The regulation queues
can use the local arrivals to the output servers as the input.



VI. Simulation 

In this section, we present the simulation results of G-BP
on a $2^4 \times 2^4$ Benes network. For simplicity, we assume
that $A_{sd}(t) = A_{\max} = 2$ for all time.

In the simulation, we assume that every flow has a utility
function $\log(1 + r_{sd})$. In every time slot, each flow can
admit 0, 1 or 2 packets. We simulate the system for $V \in
\{5, 10, 20, 50, 100\}$ and $\eta = 0.01$. Each simulation is run for
$10^5$ slots. To test the robustness of G-BP against the delay
and sparsity in message passing and the regulation queue
approximation, we simulate four different cases. (i) The original
G-BP algorithm, where the message passing delay is zero and
the regulation queue is exact. (ii) The case when the input to
the regulation queue $q_d(t)$ is the actual packet arrivals to the
output server $d$ (the service rate is still $1 - \eta$), and admission
control at time $t$ uses $q_d(t - (2n - 1))$ instead of $q_d(t)$. (iii)
Similar to the second case, but admission control at time $t$
uses $q_d(t - 5(2n - 1))$. (iv) Similar to the second case, but the
regulation queue information is only sent every $5(2n - 1)$ slots
and has a delay of $5(2n - 1)$. That is, admission control at time
$t$ uses $q_d(t')$ where $t' = \max\big[\big(\lfloor t / (5(2n-1)) \rfloor - 1\big)\, 5(2n - 1),\ 1\big]$.
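
The four cases differ only in which past value of the regulation queue the admission controller sees. The sketch below computes that time index for each case. The function name is hypothetical, and clamping the index at slot 1 in cases (ii) and (iii) is an assumption made by analogy with case (iv); the text does not say how the first few slots are handled.

```python
def regulation_queue_time(t: int, n: int, case: int) -> int:
    """Slot whose regulation-queue value is used for admission at slot t."""
    hop_delay = 2 * n - 1          # number of columns in the Benes network
    if case == 1:                  # ideal G-BP: no delay
        return t
    if case == 2:                  # (2n-1)-slot delay
        return max(t - hop_delay, 1)
    if case == 3:                  # 5(2n-1)-slot delay
        return max(t - 5 * hop_delay, 1)
    if case == 4:                  # delayed and sent only every 5(2n-1) slots
        period = 5 * hop_delay
        return max((t // period - 1) * period, 1)
    raise ValueError("case must be 1, 2, 3 or 4")
```

For the simulated network ($n = 4$), the period in case (iv) is 35 slots, so at slot 100 the controller uses the queue value from slot 35.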



[Figure 7: two plots, aggregate flow utility and average packet delay, each with curves for Ideal G-BP, (2n-1)-delay, 5(2n-1)-delay, and 5(2n-1)-delay & sparsity.]

Fig. 7. The aggregate flow utility and average packet delay under G-BP.
One can see that G-BP works very well even with message passing delay
and regulation queue approximation.

Fig. 7 shows the performance of the G-BP algorithm. Here
the average delay (in number of slots) is computed using the
set of packets that are delivered when the simulation ends.
For all simulations, this set contains more than 99.9% of the
total packets that enter the network. We see that as we increase
the $V$ value, the aggregate flow utility quickly converges to its
optimal value. However, doing so also leads to a linear increase
of the average packet delay. We also see from the figure
that G-BP is very robust to the delay and sparsity
in message passing, and to the regulation queue approximation.

In Fig. 8, we plot a recorded queue process of the network
under G-BP for $V = 10$. In this case, we change each
flow's utility function to $w_{sd} \log(1 + r_{sd})$ in the middle of the
simulation, where $w_{sd}$ takes values 1, 2 or 3 equally likely.
We see that after the change, G-BP quickly adapts to the
new utility functions and performs admission and routing
accordingly.



[Figure 8: total queue size of the network versus time.]

Fig. 8. The total network queue size under G-BP with V = 10.

Finally, we also evaluate the average packet delay as a
function of the network size, to see how the algorithm scales. For
comparison, we also simulate an "enhanced" G-BP algorithm,
in which we replace each queue value in the algorithm with
the queue value plus the node's hop count to the destination,
i.e., its column number. The idea is to create a "bias" towards
the packet destinations. This enhancement is similar to the
EDRPC algorithm developed in [16]. We can see from Fig. 9
that the average packet delay under G-BP scales as $\Theta(n^2)$.
Since the Benes network size is $\Theta(2^n)$, this implies that the
average delay grows only polylogarithmically, i.e., as the square
of the logarithm of the network size.




Fig. 9. Average delay as a function of the network size under V = 10. 

Note that in Fig. 9, we have plotted the average delay in
number of slots. To get some physical understanding of the
results, assume that each packet has 500 bytes and each link
has a capacity of 1 Gbit/second, which are both quite common
in practice. Then, every slot is 4 microseconds. Hence, we see
that the average packet delay in a Benes network with G-BP
is roughly 1 millisecond when the network size is $128 \times 128$.
This demonstrates the good delay performance of our network
design approach.
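
The slot-length arithmetic above can be reproduced with a few lines. The helper names are hypothetical; the only inputs are the packet size and link rate quoted in the text.

```python
def slot_duration_us(packet_bytes: int, link_gbps: float) -> float:
    """Time to transmit one packet, in microseconds.

    1 Gbit/s equals 1000 bits per microsecond.
    """
    bits = packet_bytes * 8
    return bits / (link_gbps * 1000.0)


def delay_in_slots(delay_ms: float, slot_us: float) -> float:
    """Convert a delay in milliseconds into a number of slots."""
    return delay_ms * 1000.0 / slot_us
```

With 500-byte packets on 1 Gbit/s links, `slot_duration_us(500, 1.0)` gives 4 microseconds, so a delay of 1 millisecond corresponds to 250 slots.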

VII. Conclusion 

In this work, we develop a novel networking solution called 
Benes packet network, which consists of a Benes network built 
with simple commodity switches, a flow utility maximization 
mechanism, and a Grouped-Backpressure (G-BP) routing and 
scheduling algorithm. We show that this combination can 
achieve a near-optimal flow utility and ensure small end-to- 
end delay for the traffic flows. Our approach also only requires 
each switch module to maintain at most four queues regardless 
of the network size, and can easily be implemented in practice 
in a fully distributed manner. 

Appendix A - Proof of Lemma 2

Proof: (Lemma 2) We first prove Part (a). From the
queueing dynamic equation (9), we see that for any node
$m \in \cup_{j=1}^{n-1} C_j$, the input rates into $Q_m^{UU}(t)$ and $Q_m^{UL}(t)$ are
equal because of random splitting. Similarly, the input rates
into $Q_m^{LU}(t)$ and $Q_m^{LL}(t)$ are the same. Hence, if the fictitious
network is stable, the output rates from these queues are equal
to their input rates [12]. Therefore (17) holds.



Now we prove Part (b) by induction. First we see that it 
holds for any 4x4 Benes network. This is because if the 
fictitious network is stable, then the input switch modules split 
the incoming flows equally into the two partition nodes (see 
Fig.0. 



Now suppose the same is true for a $2^{n-1} \times 2^{n-1}$ Benes
network. We want to show that it also holds for a $2^n \times 2^n$
Benes network.
To see this, note from Fig. 2 that each $2^n \times 2^n$ Benes network
consists of two $2^{n-1} \times 2^{n-1}$ subnetworks, $2^{n-1}$ input switch
modules and $2^{n-1}$ output switch modules. According to the
structure of the Benes network, any input switch module has
one link connecting to the upper $2^{n-1} \times 2^{n-1}$ subnetwork and
the other one connecting to the lower $2^{n-1} \times 2^{n-1}$ subnetwork.
From Part (a), we see that half of a flow's rate will be routed
through the upper subnetwork and the other half will be
routed through the lower subnetwork. Now consider the upper
subnetwork and view the flow traffic into this subnetwork as
its own external input. Since this subnetwork is also stable, the
flow's traffic will be equally split and routed via its partition
nodes by induction. Since all the partition nodes coincide
according to Fact 1, we see that the lemma follows. ■
Appendix B - Proof of Lemma 3

Here we present the proof of Lemma 3.

Proof: Squaring both sides of (7) and using the fact that
for any real number $x$, $([x]^+)^2 \le x^2$, we get for every $s \in S$
and $T \in \Omega_s$ that:

$$[Q_s^T(t+1)]^2 \le [Q_s^T(t)]^2 + [R_s^T(t)]^2 + [\mu^T_{s,m(s)}(t)]^2 - 2 Q_s^T(t)\big[\mu^T_{s,m(s)}(t) - R_s^T(t)\big]. \tag{32}$$

Now note that $\mu^T_{s,m(s)}(t) \le 1$ and $R_s^T(t) \le 2^{n-1} A_{\max}$.
Hence, if we define $B_1 = 2(2^n + 2^{3n-2} A_{\max}^2)$ and sum (32)
over $s \in S$ and $T \in \Omega_s$, we have:

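$$\sum_{s \in S,\, T \in \Omega_s} [Q_s^T(t+1)]^2 \le \sum_{s \in S,\, T \in \Omega_s} [Q_s^T(t)]^2 + B_1 - 2 \sum_{s \in S,\, T \in \Omega_s} Q_s^T(t)\big[\mu^T_{s,m(s)}(t) - R_s^T(t)\big].$$
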

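Using a similar argument as above, we get the following:

$$\sum_{m \in \cup_{j=1}^{n-1} C_j,\, T \in \Omega_B} [Q_m^T(t+1)]^2 - \sum_{m \in \cup_{j=1}^{n-1} C_j,\, T \in \Omega_B} [Q_m^T(t)]^2 \le B_2 - 2 \sum_{m \in \cup_{j=1}^{n-1} C_j,\, T \in \Omega_B} Q_m^T(t)\big[\mu^T_{m,m(T)}(t) - R_m^T(t)\big].$$

Here $B_2 = 20(n-1)2^{n-1}$. Repeating the above for all the
other queues in the fictitious system, we also get:

$$\sum_{m \in C_n,\, i=1,2} [Q_m^{D_i}(t+1)]^2 - \sum_{m \in C_n,\, i=1,2} [Q_m^{D_i}(t)]^2 \le B_3 - 2 \sum_{m \in C_n,\, i=1,2} Q_m^{D_i}(t)\big[\mu_{m,D_i}(t) - R_m^{D_i}(t)\big],$$

$$\sum_{s,d} [H_{sd}(t+1)]^2 - \sum_{s,d} [H_{sd}(t)]^2 \le B_4 - 2 \sum_{s,d} H_{sd}(t)\big[R_{sd}(t) - \gamma_{sd}(t)\big],$$

$$\sum_{d} [q_d(t+1)]^2 - \sum_{d} [q_d(t)]^2 \le B_5 - 2 \sum_{d} q_d(t)\Big[1 - \eta - \sum_{s} R_{sd}(t)\Big].$$

Here $B_3 = 5 \cdot 2^n$, $B_4 = 2^{2n+1} A_{\max}^2$ and $B_5 = 2^n + 2^{3n} A_{\max}^2$.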

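Summing all the resulting inequalities, multiplying both
sides by $\frac{1}{2}$ and taking expectations on both sides conditioning
on $Z(t)$, we obtain the following:

$$\begin{aligned}
\Delta(t) \le B &- \sum_{s \in S} \sum_{T \in \Omega_s} Q_s^T(t)\, \mathbb{E}\big[\mu^T_{s,m(s)}(t) - R_s^T(t) \mid Z(t)\big] \\
&- \sum_{m \in \cup_{j=1}^{n-1} C_j} \sum_{T \in \Omega_B} Q_m^T(t)\, \mathbb{E}\big[\mu^T_{m,m(T)}(t) - R_m^T(t) \mid Z(t)\big] \\
&- \sum_{m \in C_n} \sum_{i=1,2} Q_m^{D_i}(t)\, \mathbb{E}\big[\mu_{m,D_i}(t) - R_m^{D_i}(t) \mid Z(t)\big] \\
&- \sum_{s,d} H_{sd}(t)\, \mathbb{E}\big[R_{sd}(t) - \gamma_{sd}(t) \mid Z(t)\big] \\
&- \sum_{d} q_d(t)\, \mathbb{E}\Big[1 - \eta - \sum_{s} R_{sd}(t) \;\Big|\; Z(t)\Big].
\end{aligned} \tag{33}$$

Here the constant $B = \frac{1}{2} \sum_{i=1}^{5} B_i$, i.e.,

$$B = \frac{1}{2}\Big[2^n(10n - 2) + A_{\max}^2\big(2^{3n-1} + 2^{2n+1} + 2^{3n}\big)\Big]. \tag{34}$$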

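Now by adding to both sides of (33) the term
$-V\,\mathbb{E}\big[\sum_{s,d} U_{sd}(\gamma_{sd}(t)) \mid Z(t)\big]$, we get:

$$\begin{aligned}
\Delta(t) - V\,\mathbb{E}\Big[\sum_{s,d} U_{sd}(\gamma_{sd}(t)) \;\Big|\; Z(t)\Big] \le B &- \sum_{s \in S} \sum_{T \in \Omega_s} Q_s^T(t)\, \mathbb{E}\big[\mu^T_{s,m(s)}(t) - R_s^T(t) \mid Z(t)\big] \\
&- \sum_{m \in \cup_{j=1}^{n-1} C_j} \sum_{T \in \Omega_B} Q_m^T(t)\, \mathbb{E}\big[\mu^T_{m,m(T)}(t) - R_m^T(t) \mid Z(t)\big] \\
&- \sum_{m \in C_n} \sum_{i=1,2} Q_m^{D_i}(t)\, \mathbb{E}\big[\mu_{m,D_i}(t) - R_m^{D_i}(t) \mid Z(t)\big] \\
&- \sum_{s,d} H_{sd}(t)\, \mathbb{E}\big[R_{sd}(t) - \gamma_{sd}(t) \mid Z(t)\big] \\
&- \sum_{d} q_d(t)\, \mathbb{E}\Big[1 - \eta - \sum_{s} R_{sd}(t) \;\Big|\; Z(t)\Big] \\
&- V\,\mathbb{E}\Big[\sum_{s,d} U_{sd}(\gamma_{sd}(t)) \;\Big|\; Z(t)\Big].
\end{aligned} \tag{35}$$
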

Lemma [3] then follows by rearranging the terms, and using the 
definitions of Rj(t), ii^(t) and (i) in equations (js), ( 12 1 
and ([13). ■ 

Appendix C - Proof of Theorem 2

In this section, we prove Theorem 2. We first present a
lemma regarding queue stability and a theorem regarding rate 
allocation in a Benes network. Then, we use the two results 
to carry out our analysis. 

We first have the following lemma. 

Lemma 4: Let $Q(t) \ge 0$, $t \in \{0, 1, \ldots\}$ be a queueing
process with the following dynamics:

$$Q(t+1) = \max[Q(t) - 1, 0] + R(t). \tag{36}$$

Suppose (i) $0 \le R(t) \le A_{\max}$ for all $t$, and (ii)
$\lim_{T \to \infty} \frac{1}{T} \sum_{t=0}^{T-1} R(t) = 1 - \eta$ for $0 < \eta < 1$ with probability
1 (w.p.1). Then, $Q(t)$ is stable.

Proof: See Appendix D. ■ 
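
To illustrate the queue in Lemma 4, the sketch below iterates the dynamics $Q(t+1) = \max[Q(t) - 1, 0] + R(t)$ for a given arrival sequence. The function name is hypothetical; the code only traces the recursion and is not part of the proof.

```python
def simulate_queue(arrivals, q0=0):
    """Return the trajectory [Q(0), Q(1), ..., Q(T)] of the queue
    Q(t+1) = max(Q(t) - 1, 0) + R(t), where arrivals[t] = R(t)."""
    trajectory = [q0]
    q = q0
    for r in arrivals:
        q = max(q - 1, 0) + r  # serve one packet if present, then add arrivals
        trajectory.append(q)
    return trajectory
```

For instance, a burst of two packets followed by idle slots drains at one packet per slot: `simulate_queue([2, 0, 0])` returns `[0, 2, 1, 0]`.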
To state the theorem needed for our analysis, we first
define the notion of a stabilizing rate allocation profile for
the fictitious system. In the definition, we use $\mathcal{L}$ to denote the
set of network links in the fictitious network, and use $\mu^U_{m_1,m_2}$
and $\mu^L_{m_1,m_2}$ to denote the rates of the upper division flow
traffic and the lower division flow traffic sent from node $m_1$
to node $m_2$, respectively.

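Definition 1: (Stabilizing rate allocation profile) For an
arrival rate vector $r$, a stabilizing rate allocation profile $\mu(r) =
(\mu^U_{m,m'}, \mu^L_{m,m'}, \forall\, [m, m'] \in \mathcal{L})$ is a vector that satisfies the
following:

$$\sum_{d \le 2^{n-1}} r_{sd} \le \mu^U_{s,m(s)}, \quad \sum_{d > 2^{n-1}} r_{sd} \le \mu^L_{s,m(s)}, \quad \forall s \in S, \tag{37}$$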



E 



d<2"~ 1 

/ ^ r^m' ,m 



^ u 



Vto g U? =1 Cj, 



(38) 



^.VmSU^Cj, (39) 

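$$\mu^U_{m,m'} + \mu^L_{m,m'} \le 1, \quad \mu^U_{m,m'},\ \mu^L_{m,m'} \ge 0, \quad \forall\, [m, m'] \in \mathcal{L}, \tag{40}$$

$$\mu^L_{m,D_1} = 0, \quad \mu^U_{m,D_2} = 0, \quad \forall m \in C_n. \tag{41}$$

In the above definition, if $m \in C_n$, i.e., $m$ is a partition node,
then $m_u = D_1$ and $m_l = D_2$. We now state the following
theorem: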

Theorem 3: For every arrival rate vector $r \in \Lambda_n$,
there exists a stabilizing rate allocation profile $\mu(r) =
(\mu^U_{m,m'}, \mu^L_{m,m'}, \forall\, [m, m'] \in \mathcal{L})$ for the fictitious network that
has the following property:

$$\mu^U_{m,m_u} = \mu^U_{m,m_l}, \quad \mu^L_{m,m_u} = \mu^L_{m,m_l}, \quad \forall m \in \cup_{j=1}^{n-1} C_j. \tag{42}$$

Proof: See Appendix E. ■

Now we prove Theorem 2.

Proof: (Theorem [2]) (Part A-stability) We start by prov- 
ing network stability. Our proof has two parts. In part one, we 
show that the fictitious network is stable, which implies that 
the nodes in columns 1 to n — 1 of the physical network are 
stable. Then, we show that each individual node in columns 
n to 2n — 1 of the physical network is stable. 

• (Fictitious network) From the auxiliary variable selection
step (24) and the fact that the maximum first derivative of the
utility functions is $\beta$, we see that whenever $H_{sd}(t) > V\beta$,
G-BP will set $\gamma_{sd}(t) = 0$. Hence, using the fact that $0 \le
\gamma_{sd}(t) \le A_{\max}$ for all time, we have:

$$0 \le H_{sd}(t) \le V\beta + A_{\max}, \tag{43}$$

for all $(s, d)$ flows and for all time.

Now consider the admission control step. We see that
whenever $Q_s^U(t) > H_{sd}(t)$, $R_{sd}(t) = 0$ for any upper division
$(s,d)$ flow. Similarly, whenever $Q_s^L(t) > H_{sd}(t)$, $R_{sd}(t) = 0$
for any lower division $(s, d)$ flow. Since for both $Q_s^U(t)$ and
$Q_s^L(t)$, there can be at most $2^{n-1} A_{\max}$ new packet arrivals in
a single time slot, we have for every $s \in S$ that:

$$Q_s^U(t) \le V\beta + (2^{n-1} + 1) A_{\max}, \tag{44}$$
$$Q_s^L(t) \le V\beta + (2^{n-1} + 1) A_{\max}. \tag{45}$$

Similarly, we also see that for every $(s,d)$ flow, if $q_d(t) >
H_{sd}(t)$, then $R_{sd}(t) = 0$. This together with (43) implies that:

$$q_d(t) \le V\beta + (2^n + 1) A_{\max}. \tag{46}$$

Here the term $2^n A_{\max}$ is because in any time slot, there can
be at most $2^n A_{\max}$ new packets entering $q_d(t)$. Hence, all the
regulation queues are also stable.
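
The deterministic bounds (43)-(46) are simple functions of the parameters. The sketch below evaluates them, reading the bounds as $V\beta + A_{\max}$, $V\beta + (2^{n-1}+1)A_{\max}$ and $V\beta + (2^n+1)A_{\max}$; the function name and dictionary keys are hypothetical.

```python
def queue_bounds(V: float, beta: float, n: int, a_max: float) -> dict:
    """Worst-case sizes of the admission queues H_sd, the source
    queues Q_s^U / Q_s^L, and the regulation queues q_d."""
    base = V * beta
    return {
        "H_sd": base + a_max,                       # bound (43)
        "Q_s": base + (2 ** (n - 1) + 1) * a_max,   # bounds (44), (45)
        "q_d": base + (2 ** n + 1) * a_max,         # bound (46)
    }
```

With the simulation parameters $n = 4$, $A_{\max} = 2$, $\beta = 1$ and $V = 10$, the three bounds are 12, 28 and 44 packets. All of them grow linearly in $V$.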



Now consider a node $m \in C_1$ and look at its upper division
queues $Q_m^{UU}(t)$ and $Q_m^{UL}(t)$. According to the routing and
scheduling rules, in order for either of the two queues to receive
new arrivals, there must exist a node $s \in M_m$ such that
$Q_m^{UU}(t) + Q_m^{UL}(t) < 2 Q_s^U(t)$. This together with
(44) implies that:

$$Q_m^{UU}(t) + Q_m^{UL}(t) \le 2\big(V\beta + (2^{n-1} + 1) A_{\max}\big) + 2, \quad \forall m \in C_1.$$

Here the last additive term 2 is because at any time $t$, there
can be at most 2 new packet arrivals to node $m$. Similarly, we
have for the lower division queues that:

$$Q_m^{LU}(t) + Q_m^{LL}(t) \le 2\big(V\beta + (2^{n-1} + 1) A_{\max}\big) + 2, \quad \forall m \in C_1.$$
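With the above reasoning, one can show that for $m \in C_2$,

$$Q_m^{UU}(t) + Q_m^{UL}(t) \le 2^2\big(V\beta + (2^{n-1} + 1) A_{\max}\big) + 2^2 + 2, \tag{47}$$
$$Q_m^{LU}(t) + Q_m^{LL}(t) \le 2^2\big(V\beta + (2^{n-1} + 1) A_{\max}\big) + 2^2 + 2. \tag{48}$$

More generally, for every node $m \in \cup_{j=1}^{n-1} C_j$, we have:

$$Q_m^{UU}(t) + Q_m^{UL}(t) \le 2^{j_m}\big(V\beta + (2^{n-1} + 1) A_{\max}\big) + \sum_{l=1}^{j_m} 2^l,$$
$$Q_m^{LU}(t) + Q_m^{LL}(t) \le 2^{j_m}\big(V\beta + (2^{n-1} + 1) A_{\max}\big) + \sum_{l=1}^{j_m} 2^l,$$

and for $m \in C_n$, we have:

$$Q_m^{D_1}(t) \le 2^n\big(V\beta + (2^{n-1} + 1) A_{\max}\big) + \sum_{l=1}^{n} 2^l,$$
$$Q_m^{D_2}(t) \le 2^n\big(V\beta + (2^{n-1} + 1) A_{\max}\big) + \sum_{l=1}^{n} 2^l.$$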

This proves that the fictitious network and the nodes in 
columns 1 to n — 1 in the physical network are stable. 

• (Second half of the physical network) Now we show that
the nodes in columns $n$ to $2n - 1$ of the physical network are
stable. Recall that in this second half of the physical network,
there are only two queues at each switch module $m$, i.e.,
$Q_m^a(t)$ for the upper outgoing link $a$ and $Q_m^b(t)$ for the lower
outgoing link $b$.

We first consider a partition node $m \in C_n$. Since the
fictitious network is stable, Lemma 2 shows that for every
$(s,d)$ flow, its rate is equally split among the $2^{n-1}$ partition
nodes. Using Lemma 1, we see that the total flow rate going
through the upper output link $a$ of $m$ is given by:

$$\sum_{d} \sum_{s} r_{sd} / 2^{n-1} \le 2^{n-1}(1 - \eta)/2^{n-1} \le 1 - \eta, \tag{49}$$

where the sum over $d$ ranges over the $2^{n-1}$ output ports served through link $a$.
Here the first inequality uses the fact that the regulation queues
are stable, which implies $\sum_s r_{sd} \le 1 - \eta$. Thus, the total input
rate into $Q_m^a(t)$ is no more than $1 - \eta$ whereas the total output
rate is 1 according to G-BP. This, together with Lemma 4 and
the fact that the maximum number of packets that can enter
$Q_m^a(t)$ at any time is 2, implies that for any partition node
$m \in C_n$, $Q_m^a(t)$ is stable. Similarly, one can show that $Q_m^b(t)$
is stable.

Now we look at a node $m \in C_{n+1}$. Note that node $m$ is
connected to two partition nodes in $C_n$. Using Lemma 1 and
Lemma 2, we see that the total rate going through the output
link $a$ of $m$ is given by:

2 E 

= 2 E E^/2- 1 

de[« TO 2™- 1 ,(« ro +|)2™- 1 ] » 

< 2"^ 1 (1 -? ? )/2™- 1 

< 1-ry. 

A similar argument will show that the total rate going through
the output link $b$ is also no more than $1 - \eta$, proving that
the nodes in $C_{n+1}$ are all stable. Now by repeatedly applying
Lemma 1, Lemma 2 and the above reasoning, one can show
that for any node $m \in \cup_{j=n}^{2n-1} C_j$, the total input rates into
$Q_m^a(t)$ and $Q_m^b(t)$ are both no more than $1 - \eta$ while the
service rates are both 1. Hence, every node in the second half
of the physical network is stable. This completes the proof of
network stability.

(Part B-utility) We now prove the flow utility performance
(31). The analysis is done by first constructing a near-optimal
solution to an optimization problem that captures the optimal
utility. Then, we show that our algorithm achieves a similar
utility performance by comparing the Lyapunov drift values.

To start, we use $A = (A_{sd}, \forall (s, d))$ to denote the random
arrival vector and use $\{R^{(A,k)} = (R_{sd}^{(A,k)}, \forall (s, d)),\ k =
1, 2, \ldots\}$ to denote a sequence of admission vectors under
arrival vector $A$. We then formulate the following optimization
problem:

^U sd ( 7sd ) (50) 



sd 



(51) 



E^ = E E [i>^' fc) ]<l, (52) 



,1 

E 



Tsd 



E ] 



E[£f* W } ] <!-»?, Vd,(53) 



E^1 A) 



l,p^ A) > 0, VA. 



(54) 
(55) 



Here $p_k^{(A)}$ can be interpreted as the fraction of time the system
uses the vector $R^{(A,k)}$ when the arrival vector is $A$, and the
expectation is taken over the random arrival vector $A$.

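For any given $\eta$ value, denote by $\gamma^*(\eta)$, $r^*(\eta)$, and
$\{R^{(A,k)*}(\eta), p_k^{(A)*}(\eta)\}_{k=1}^{\infty}$ an optimal solution of (50) and
let the optimal value be $\phi^*_\eta$. Since each utility function $U_{sd}(\cdot)$
is concave increasing, we see that $\gamma^*_{sd}(\eta) = r^*_{sd}(\eta)$ for all
$(s, d)$. We also see that $r^*(\eta) \in \Lambda_n$, because (52) and (53) are
sufficient conditions to guarantee that an arrival rate vector is
in $\Lambda_n$. Moreover, using an argument based on Caratheodory's
theorem as in [12], one can show that $\phi^*_0$, i.e., the value of
(50) at $\eta = 0$, provides an upper bound of the optimal utility
of our problem, i.e.,

$$\phi^*_0 \ge U(r^{\text{opt}}). \tag{56}$$

This is so because any feasible rate solution to our problem
must also satisfy all the constraints (51)-(54) with $\eta = 0$.

We create a solution $\hat{\gamma}(\eta)$, $\hat{r}(\eta)$, $\{\hat{R}^{(A,k)}(\eta), \hat{p}_k^{(A)}(\eta)\}_{k=1}^{\infty}$
for (50) as follows:

$$\hat{\gamma}_{sd}(\eta) = (1 - \eta)\gamma^*_{sd}(0), \quad \hat{r}_{sd}(\eta) = (1 - \eta) r^*_{sd}(0), \tag{57}$$

$$\hat{R}^{(A,k)}(\eta) = (1 - \eta) R^{(A,k)*}(0), \quad \hat{p}_k^{(A)}(\eta) = p_k^{(A)*}(0). \tag{58}$$

It can be verified that $(\hat{\gamma}(\eta), \hat{r}(\eta), \{\hat{R}^{(A,k)}(\eta), \hat{p}_k^{(A)}(\eta)\}_{k=1}^{\infty})$
is a feasible solution for (50). Denote the value of (50) under
this solution as $\hat{\phi}_\eta$. Using the definition of $\beta$, we see that:

$$U_{sd}(r^*_{sd}(0)) \le U_{sd}(\hat{r}_{sd}(\eta)) + \beta \eta\, r^*_{sd}(0). \tag{59}$$

Therefore, we have:

$$\phi^*_0 \le \hat{\phi}_\eta + \beta \eta \sum_{s,d} r^*_{sd}(0) \le \hat{\phi}_\eta + \eta \beta 2^n. \tag{60}$$

Here the last step follows because $r^*(0) \in \Lambda_n$, which implies
$\sum_{s,d} r^*_{sd}(0) \le 2^n$. (56) then implies that:

$$\hat{\phi}_\eta \ge U(r^{\text{opt}}) - \eta \beta 2^n. \tag{61}$$

Since $r^*(\eta) \in \Lambda_n$, by Theorem 3 there exists a stabilizing rate
allocation vector $\mu(r^*(\eta))$ that satisfies (42) for all nodes in
$C_1$ to $C_{n-1}$, which further implies that there exists a stationary
and randomized routing and scheduling policy $\Pi$ that achieves
the following for all $m \in \cup_{j=1}^{n} C_j$:

$$\mathbb{E}[\mu^U_{s,m(s)}(t)] = \mu^U_{s,m(s)}(r^*(\eta)), \quad \mathbb{E}[\mu^L_{s,m(s)}(t)] = \mu^L_{s,m(s)}(r^*(\eta)),$$
$$\mathbb{E}[\mu^U_{m,m_u}(t)] = \mu^U_{m,m_u}(r^*(\eta)), \quad \mathbb{E}[\mu^U_{m,m_l}(t)] = \mu^U_{m,m_l}(r^*(\eta)),$$
$$\mathbb{E}[\mu^L_{m,m_u}(t)] = \mu^L_{m,m_u}(r^*(\eta)), \quad \mathbb{E}[\mu^L_{m,m_l}(t)] = \mu^L_{m,m_l}(r^*(\eta)).$$

Here we again assume that if $m \in C_n$, then $m_u = D_1$,
$m_l = D_2$, $\mu^U_{m,m_u}(t) = \mu_{m,D_1}(t)$, $\mu^L_{m,m_l}(t) = \mu_{m,D_2}(t)$,
$\mu^L_{m,m_u}(t) = 0$, and $\mu^U_{m,m_l}(t) = 0$.

Now since the G-BP algorithm is constructed by choosing
the actions to minimize the RHS of (22), or equivalently (35),
we see that (35) remains true if we plug in any alternate
control actions. Thus, we plug in the solution $(\gamma^*(\eta), r^*(\eta),
\{R^{(A,k)*}(\eta), p_k^{(A)*}(\eta)\}_{k=1}^{\infty})$, and the routing and scheduling
policy $\Pi$ above, which guarantees:

$$\mathbb{E}[\mu^T_{s,m(s)}(t) - R_s^T(t)] \ge 0, \quad \forall s,\ T \in \Omega_s, \tag{62}$$
$$\mathbb{E}[\mu^T_{m,m(T)}(t) - R_m^T(t)] \ge 0, \quad \forall m \in \cup_{j=1}^{n-1} C_j,\ T \in \Omega_B, \tag{63}$$
$$\mathbb{E}[\mu_{m,D_i}(t) - R_m^{D_i}(t)] \ge 0, \quad \forall m \in C_n,\ i = 1, 2, \tag{64}$$
$$\mathbb{E}[R_{sd}(t) - \gamma_{sd}(t)] \ge 0, \quad \mathbb{E}\Big[1 - \eta - \sum_{s} R_{sd}(t)\Big] \ge 0. \tag{65}$$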



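Thus, using the definition of $\gamma^*(\eta)$, $r^*(\eta)$, and (62)-(65), we
see that after plugging in the alternative actions, (35) becomes:

$$\Delta(t) - V\,\mathbb{E}\Big[\sum_{s,d} U_{sd}(\gamma_{sd}^{\text{G-BP}}(t)) \;\Big|\; Z(t)\Big] \le B - V \sum_{s,d} U_{sd}(\gamma^*_{sd}(\eta)) \le B - V U(r^{\text{opt}}) + V \eta \beta 2^n. \tag{66}$$

Here the last step follows from (61). Taking expectations over
$Z(t)$ on both sides, summing (66) over $t = 0, \ldots, T - 1$, and
rearranging terms, we have:

$$T V U(r^{\text{opt}}) - T V \eta \beta 2^n - B T \le V \sum_{t=0}^{T-1} \mathbb{E}\Big[\sum_{s,d} U_{sd}(\gamma_{sd}^{\text{G-BP}}(t))\Big] + \mathbb{E}[L(0)].$$

Dividing both sides by $TV$, we get:

$$U(r^{\text{opt}}) - B/V - \eta \beta 2^n \le \frac{1}{T} \sum_{t=0}^{T-1} \mathbb{E}\Big[\sum_{s,d} U_{sd}(\gamma_{sd}^{\text{G-BP}}(t))\Big] + \mathbb{E}[L(0)]/TV.$$

Using Jensen's inequality, we have:

$$U(r^{\text{opt}}) - B/V - \eta \beta 2^n \le \sum_{s,d} U_{sd}\Big(\frac{1}{T} \sum_{t=0}^{T-1} \mathbb{E}[\gamma_{sd}^{\text{G-BP}}(t)]\Big) + \mathbb{E}[L(0)]/TV.$$

Taking a limit as $T \to \infty$ and using the fact that $\mathbb{E}[L(0)] <
\infty$, we get:

$$U(r^{\text{opt}}) - B/V - \eta \beta 2^n \le \sum_{s,d} U_{sd}(\bar{\gamma}_{sd}^{\text{G-BP}}). \tag{67}$$

Here $\bar{\gamma}_{sd}^{\text{G-BP}}$ is the average value of $\gamma_{sd}(t)$ under G-BP, i.e.,

$$\bar{\gamma}_{sd}^{\text{G-BP}} = \lim_{T \to \infty} \frac{1}{T} \sum_{t=0}^{T-1} \mathbb{E}[\gamma_{sd}^{\text{G-BP}}(t)].$$

Finally, recall that all the admission queues $H_{sd}(t)$ are stable,
which implies $\bar{\gamma}_{sd}^{\text{G-BP}} \le r_{sd}^{\text{G-BP}}$. Therefore,

$$U(r^{\text{opt}}) - B/V - \eta \beta 2^n \le \sum_{s,d} U_{sd}(r_{sd}^{\text{G-BP}}). \tag{68}$$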

This completes the proof of the theorem. ■ 
Appendix D - Proof of Lemma 4

We now prove Lemma 4.

Proof: (Lemma 4) We prove the lemma by contradiction.
Suppose the conclusion is not true. Then, for any finite
constant $M$, there exists a time $t$ such that $Q(t) > M$.

Since $\lim_{T \to \infty} \frac{1}{T} \sum_{t=0}^{T-1} R(t) = 1 - \eta$ with probability one,
we see that for any finite starting time $t_0$ and for any $\epsilon > 0$,
there exists a time $T_{(\epsilon)} < \infty$ such that for any $T \ge T_{(\epsilon)}$,

$$\frac{1}{T} \sum_{t=t_0}^{t_0+T-1} R(t) \le 1 - \eta + \epsilon, \quad \text{w.p.1}. \tag{69}$$

Now fix $\epsilon = \eta/2$ and choose $M = A_{\max} T_{(\eta/2)}$. Let $t^*$ be
the time when $Q(t^*) > M$ and let $t_0^*$ be the beginning of the
busy period during which the event $\{Q(t^*) > M\}$ happens,
i.e., $Q(t_0^* - 1) = 0$, and for any time $t \in [t_0^*, t^*]$, $Q(t) > 0$.
We see that:

$$Q(t^*) = \sum_{\tau = t_0^*}^{t^*-1} R(\tau) - (t^* - t_0^* - 1).$$

Since $M = A_{\max} T_{(\eta/2)}$, we must have $t^* - t_0^* - 1 \ge T_{(\eta/2)}$,
for otherwise $\sum_{\tau = t_0^*}^{t^*-1} R(\tau) \le M$. In this case, using (69) and
$\epsilon = \eta/2$, we have with probability 1 that:

$$Q(t^*) \le (1 - \eta/2)(t^* - t_0^* - 1) - (t^* - t_0^* - 1) < 0.$$

This contradicts the fact that $Q(t^*) > M$. Hence, $Q(t) \le M$
with probability 1 and $Q(t)$ is stable. ■

Appendix E - Proof of Theorem 3

Now we prove Theorem 3.

Proof: (Theorem 3) We use induction to prove the theorem.
The idea is to construct a feasible rate allocation profile
that balances the input and output rates for each switch module
in the fictitious network.

We first show that the result holds for a $4 \times 4$ Benes network.
In this case, the fictitious network is shown in Fig. 10.



Fig. 10. The fictitious network for a 4 x 4 Benes network.

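Suppose the arrival rate vector is $r = (r_{sd}, \forall s, d) \in \Lambda_2 =
\{r \mid \sum_{s=1}^{4} r_{sd} \le 1,\ \sum_{d=1}^{4} r_{sd} \le 1,\ \forall s, d\}$. We construct a
stabilizing rate allocation profile $\mu(r)$ as follows:

$$\mu^U_{s_i,m_1} = \sum_{d=1,2} r_{s_i d}, \quad \mu^L_{s_i,m_1} = \sum_{d=3,4} r_{s_i d}, \quad \forall s_i = 1, 2,$$

$$\mu^U_{s_i,m_2} = \sum_{d=1,2} r_{s_i d}, \quad \mu^L_{s_i,m_2} = \sum_{d=3,4} r_{s_i d}, \quad \forall s_i = 3, 4,$$

$$\mu^U_{m_1,m_3} = \mu^U_{m_1,m_4} = \frac{1}{2} \sum_{d=1,2} (r_{1d} + r_{2d}),$$

$$\mu^L_{m_1,m_3} = \mu^L_{m_1,m_4} = \frac{1}{2} \sum_{d=3,4} (r_{1d} + r_{2d}),$$

$$\mu^U_{m_2,m_3} = \mu^U_{m_2,m_4} = \frac{1}{2} \sum_{d=1,2} (r_{3d} + r_{4d}),$$

$$\mu^L_{m_2,m_3} = \mu^L_{m_2,m_4} = \frac{1}{2} \sum_{d=3,4} (r_{3d} + r_{4d}),$$

$$\mu_{m_3,D_1} = \mu_{m_4,D_1} = \frac{1}{2} \sum_{d=1,2} \sum_{s} r_{sd},$$

$$\mu_{m_3,D_2} = \mu_{m_4,D_2} = \frac{1}{2} \sum_{d=3,4} \sum_{s} r_{sd}.$$

Since $r \in \Lambda_2$, it can be verified that $\mu(r)$ satisfies all the
constraints (37)-(40). Hence, it is a stabilizing rate allocation
profile. This proves the $4 \times 4$ case.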
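
The base case can be checked numerically. The sketch below builds the allocation for a $4 \times 4$ rate matrix under the following reading of the construction, which is an assumption about the garbled source: each input server sends its upper-division rate (destinations 1, 2) and lower-division rate (destinations 3, 4) to its input module, each input module splits both divisions equally between the partition nodes $m_3$ and $m_4$, and each partition node forwards half of each division to $D_1$ or $D_2$. The node labels and function names are hypothetical.

```python
def allocation_4x4(r):
    """Map each fictitious link to (upper-division rate, lower-division rate).

    r[s][d] is the rate of flow (s+1, d+1), with s, d in 0..3.
    """
    def up(s):   # rate of source s to destinations 1, 2
        return r[s][0] + r[s][1]

    def low(s):  # rate of source s to destinations 3, 4
        return r[s][2] + r[s][3]

    mu = {}
    for s in range(4):
        module = "m1" if s < 2 else "m2"
        mu[("s%d" % (s + 1), module)] = (up(s), low(s))
    # Input modules split each division equally over m3 and m4.
    mu[("m1", "m3")] = mu[("m1", "m4")] = (0.5 * (up(0) + up(1)),
                                           0.5 * (low(0) + low(1)))
    mu[("m2", "m3")] = mu[("m2", "m4")] = (0.5 * (up(2) + up(3)),
                                           0.5 * (low(2) + low(3)))
    # Partition nodes forward upper traffic to D1 and lower traffic to D2.
    to_d1 = 0.5 * sum(up(s) for s in range(4))
    to_d2 = 0.5 * sum(low(s) for s in range(4))
    for m in ("m3", "m4"):
        mu[(m, "D1")] = (to_d1, 0.0)
        mu[(m, "D2")] = (0.0, to_d2)
    return mu


def link_feasible(mu, tol=1e-12):
    """Check nonnegativity and the unit link capacity on every link."""
    return all(u >= 0 and l >= 0 and u + l <= 1 + tol
               for (u, l) in mu.values())
```

For the uniform matrix with all entries 0.25, every link is loaded exactly to capacity, which shows that the construction is tight on the boundary of $\Lambda_2$.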

Now suppose the result holds for the $2^{n-1} \times 2^{n-1}$ Benes
network. We show that it also holds for the $2^n \times 2^n$ Benes
network $B_n$. To do so, let $r \in \Lambda_n$ denote the input vector
to $B_n$; we construct a stabilizing rate allocation profile as
follows.
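First, for each input server $s$, we let

$$\mu^U_{s,m(s)} = \sum_{d \le 2^{n-1}} r_{sd}, \quad \mu^L_{s,m(s)} = \sum_{d > 2^{n-1}} r_{sd}. \tag{70}$$

Then, for each $m \in C_1$, we let:

$$\mu^U_{m,m_u} = \mu^U_{m,m_l} = \frac{1}{2} \sum_{s \in \{2i_m - 1,\, 2i_m\}} \sum_{d \le 2^{n-1}} r_{sd}, \tag{71}$$

$$\mu^L_{m,m_u} = \mu^L_{m,m_l} = \frac{1}{2} \sum_{s \in \{2i_m - 1,\, 2i_m\}} \sum_{d > 2^{n-1}} r_{sd}. \tag{72}$$

Here $s = 2i_m - 1$ and $2i_m$ are the input servers that connect
to $m$. Note that (71) and (72) can also be viewed as equally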
splitting the traffic of each flow going through $m \in C_1$ to its
two next-hop nodes $m_u$ and $m_l$, one in the upper subnetwork
and the other in the lower subnetwork. We thus take these
as the traffic input rates to the two subnetworks of the Benes
network.

Now consider the upper subnetwork and label all the input
and output ports of the upper subnetwork by $s' \in \{1, \ldots, 2^{n-1}\}$
and $d' \in \{1, \ldots, 2^{n-1}\}$. According to the construction rules of
$B_n$ in Section III-A, an input port $s'$ is connected to the switch
module in row $s'$ in $C_1$ of $B_n$, while an output port $d'$ connects
to the switch module in row $d'$ in $C_{2n-1}$ of $B_n$. These imply
that the traffic going from input port $s'$ to output port $d'$ in the
upper $2^{n-1} \times 2^{n-1}$ subnetwork consists of the traffic
going from input ports $2s' - 1$ and $2s'$ to output ports $2d' - 1$ and $2d'$
in $B_n$. Denote the rate of this traffic by $\tilde{r}_{s'd'}$. Using (71) and
(72), we have:

$$\tilde{r}_{s'd'} = \frac{1}{2}\big[r_{(2s'-1)(2d'-1)} + r_{(2s'-1)(2d')} + r_{(2s')(2d'-1)} + r_{(2s')(2d')}\big]. \tag{73}$$

Hence, we have:

$$\sum_{s'} \tilde{r}_{s'd'} = \frac{1}{2} \sum_{s} \big(r_{s(2d'-1)} + r_{s(2d')}\big) \le 1. \tag{74}$$

Similarly, we have:

$$\sum_{d'} \tilde{r}_{s'd'} = \frac{1}{2} \sum_{d} \big(r_{(2s'-1)d} + r_{(2s')d}\big) \le 1. \tag{75}$$


(74) and (75) thus imply that $\tilde{r} \in \Lambda_{n-1}$. Hence, by induction,
there exists a stabilizing rate allocation $\mu^{\text{up}}(\tilde{r}) =
(\mu^U_{m,m'}, \mu^L_{m,m'}, \forall m, m')$ that serves the arrival rate vector $\tilde{r}$
within the upper $2^{n-1} \times 2^{n-1}$ subnetwork in a symmetric
manner, i.e., satisfies (42). Similarly, one can show that
there exists a balanced stabilizing rate allocation $\mu^{\text{low}}(\tilde{r}) =
(\mu^U_{m,m'}, \mu^L_{m,m'}, \forall m, m')$ for the lower subnetwork.
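
The induction step can be illustrated by computing the traffic matrix offered to one subnetwork. The sketch below assumes the aggregation rule reads as half the total rate between the two input ports of module $s'$ and the two output ports of module $d'$; it uses 0-based indices, so ports $2s'-1, 2s'$ become indices `2*i, 2*i + 1`. The function name is hypothetical.

```python
def subnetwork_rates(r):
    """Rate matrix seen by the upper (or lower) half-size subnetwork.

    r is a 2^n x 2^n rate matrix given as a list of lists. Each flow is
    split equally between the two subnetworks, hence the factor 0.5.
    """
    half = len(r) // 2
    return [[0.5 * (r[2 * i][2 * j] + r[2 * i][2 * j + 1]
                    + r[2 * i + 1][2 * j] + r[2 * i + 1][2 * j + 1])
             for j in range(half)]
            for i in range(half)]
```

If every row and column of `r` sums to at most one, the same holds for the returned matrix, which is what allows the induction hypothesis to be applied to the subnetwork.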

Now a stabilizing rate allocation profile for $B_n$ can be
constructed as follows:

• For an input server $s \in S$, we use $\mu^U_{s,m(s)}$ and $\mu^L_{s,m(s)}$
as in (70).

• For a switch module $m \in C_1$, we use $\mu^U_{m,m_u}$, $\mu^U_{m,m_l}$,
$\mu^L_{m,m_u}$, and $\mu^L_{m,m_l}$ as in (71) and (72).

• For the switch modules in the upper subnetwork, use
$\mu^{\text{up}}(\tilde{r})$; for the switch modules in the lower subnetwork,
use $\mu^{\text{low}}(\tilde{r})$.

It can be verified that this rate vector satisfies all the constraints
(37)-(40), and thus is a stabilizing rate allocation vector for
$B_n$. By induction, this proves the theorem. ■